    ALOJA: A framework for benchmarking and predictive analytics in Hadoop deployments

    This article presents the ALOJA project and its analytics tools, which leverage machine learning to interpret Big Data benchmark performance data and guide tuning. ALOJA is part of a long-term collaboration between BSC and Microsoft to automate the characterization of the cost-effectiveness of Big Data deployments, currently focusing on Hadoop. Hadoop presents a complex run-time environment, where costs and performance depend on a large number of configuration choices. The ALOJA project has created an open, vendor-neutral repository featuring over 40,000 Hadoop job executions and their performance details. The repository is accompanied by a test bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameters, and Cloud services. Despite early success within ALOJA, a comprehensive study requires automating the modeling procedures to allow analysis of large and resource-constrained search spaces. The predictive analytics extension, ALOJA-ML, provides an automated system that enables knowledge discovery by modeling environments from observed executions. The resulting models can forecast execution behaviors, predicting execution times for new configurations and hardware choices. This also enables model-based anomaly detection and efficient benchmark guidance by prioritizing executions. In addition, the community can benefit from the ALOJA data sets and framework to improve the design and deployment of Big Data applications. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595). This work is partially supported by the Ministry of Economy of Spain under contracts TIN2012-34557 and 2014SGR1051. Peer Reviewed. Postprint (published version).
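
    As an illustration of the predictive analytics described above, the sketch below trains a regression model on past run records and uses it to forecast execution times and flag anomalous runs. It is a minimal Python sketch: the CSV file, column names, and anomaly threshold are hypothetical stand-ins, not the actual ALOJA repository schema or the ALOJA-ML implementation.

    ```python
    # Minimal sketch of the kind of predictive model ALOJA-ML describes:
    # learn execution time from configuration/hardware features of past runs,
    # then score new configurations. Column names and the CSV layout are
    # hypothetical, not the actual ALOJA repository schema.
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    runs = pd.read_csv("hadoop_runs.csv")      # hypothetical export of past executions
    features = pd.get_dummies(                 # one-hot encode categorical choices
        runs[["benchmark", "disk_type", "net_type", "maps", "compression"]]
    )
    target = runs["exec_time_s"]

    X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2)
    model = RandomForestRegressor(n_estimators=200).fit(X_train, y_train)

    # Forecast execution times for unseen configurations...
    predicted = model.predict(X_test)
    # ...and flag runs whose observed time deviates strongly from the forecast,
    # a simple stand-in for the model-based anomaly detection mentioned above.
    residuals = y_test - predicted
    anomalies = residuals.abs() > 3 * residuals.std()
    print("suspect runs:", y_test[anomalies].index.tolist())
    ```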

    Business process mining from e-commerce web logs

    The dynamic nature of the Web and its increasing importance as an economic platform create the need for new methods and tools for business efficiency. Current Web analytics tools do not provide the necessary abstracted view of the underlying customer processes and critical paths of site visitor behavior. Such information can offer insights for businesses to react effectively and efficiently. We propose applying Business Process Management (BPM) methodologies to e-commerce Website logs, and present the challenges, results, and potential benefits of such an approach. We use the Business Process Insight (BPI) platform, a collaborative process intelligence toolset that implements the discovery of loosely coupled processes and includes novel process mining techniques suitable for the Web. Experiments are performed on custom click-stream logs from a large online travel and booking agency. We first compare Web clicks and BPM events, and then present a methodology to classify and transform URLs into events. We evaluate traditional and custom process mining algorithms to extract business models from real-life Web data. The resulting models present an abstracted view of the relations between pages, exit points, and critical paths taken by customers. Compared to current state-of-the-art Web analytics, such models offer important improvements and aid high-level decision making and optimization of e-commerce sites. Peer Reviewed.
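
    To make the URL-to-event step concrete, the sketch below abstracts raw click-stream rows into business events that a process mining tool could consume. It is a minimal Python sketch; the URL patterns, event names, and log columns are illustrative assumptions, not the agency's actual scheme or the BPI platform's API.

    ```python
    # A simplified URL-to-event mapping: click-stream rows (session, timestamp,
    # URL) are abstracted into business events suitable for process mining.
    # Patterns, event names, and columns are illustrative assumptions.
    import csv
    import re

    URL_EVENTS = [
        (re.compile(r"/search"),       "Search"),
        (re.compile(r"/hotel/\d+"),    "ViewProduct"),
        (re.compile(r"/cart/add"),     "AddToCart"),
        (re.compile(r"/checkout"),     "Checkout"),
        (re.compile(r"/confirmation"), "BookingConfirmed"),
    ]

    def classify(url: str) -> str:
        for pattern, event in URL_EVENTS:
            if pattern.search(url):
                return event
        return "Other"                     # unclassified clicks stay visible

    with open("clickstream.csv") as src, open("event_log.csv", "w", newline="") as dst:
        reader = csv.DictReader(src)       # expects session_id, timestamp, url columns
        writer = csv.DictWriter(dst, fieldnames=["case_id", "timestamp", "activity"])
        writer.writeheader()
        for row in reader:
            writer.writerow({"case_id": row["session_id"],
                             "timestamp": row["timestamp"],
                             "activity": classify(row["url"])})
    ```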

    Adaptive distributed mechanism against flooding network attacks based on machine learning

    Adaptive techniques based on machine learning and data mining are gaining relevance in self-management and self-defense for networks and distributed systems. In this paper, we focus on early detection and stopping of distributed flooding attacks and network abuses. We extend the framework proposed by Zhang and Parashar (2006) to cooperatively detect and react to abnormal behaviors before the target machine collapses and network performance degrades. In this framework, nodes in an intermediate network share information about their local traffic observations, improving their global traffic perspective. In our proposal, we add to each node the ability to learn independently, so that it reacts differently according to its position in the network and local traffic conditions. In particular, this frees the administrator from having to guess and manually set the parameters that distinguish attacks from non-attacks: such thresholds are now learned from experience or past data. We expect our framework to provide faster detection and higher accuracy against distributed flooding attacks than static filters or single-machine adaptive mechanisms. We show simulations in which we indeed observe a high rate of stopped attacks with minimal disturbance to legitimate users.
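
    The sketch below illustrates the learned-threshold idea in its simplest form: a node derives its own notion of normal traffic from past local observations and lowers its alarm bar when neighbors report alerts. It is a simplified Python illustration under assumed statistics (mean plus k standard deviations), not the detection algorithm of the paper or of Zhang and Parashar's framework.

    ```python
    # Simplified illustration of a node learning its alarm threshold from
    # experience instead of relying on an administrator-set static filter.
    # Statistics and parameters are illustrative, not the paper's algorithm.
    from statistics import mean, stdev

    class AdaptiveNode:
        def __init__(self, k: float = 3.0):
            self.k = k              # sensitivity factor (not guessed per attack)
            self.history = []       # locally observed packet rates

        def observe(self, rate: float) -> None:
            self.history.append(rate)

        def threshold(self) -> float:
            # Learn the alarm threshold from past data: mean + k standard deviations.
            return mean(self.history) + self.k * stdev(self.history)

        def is_suspicious(self, rate: float, neighbor_alerts: int = 0) -> bool:
            # Cooperative twist: alerts shared by other nodes lower the local bar.
            return rate > self.threshold() / (1 + neighbor_alerts)

    node = AdaptiveNode()
    for r in [110, 95, 120, 100, 105, 98, 112]:   # benign training traffic (packets/s)
        node.observe(r)
    print(node.is_suspicious(115))                    # False: within learned normal range
    print(node.is_suspicious(400))                    # True: far above learned threshold
    print(node.is_suspicious(130, neighbor_alerts=2)) # True: neighbor alerts lower the bar
    ```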

    ALOJA: a systematic study of Hadoop deployment variables to enable automated characterization of cost-effectiveness

    This article presents the ALOJA project, an initiative to produce mechanisms for the automated characterization of the cost-effectiveness of Hadoop deployments, and reports its initial results. ALOJA is the latest phase of a long-term collaborative engagement between BSC and Microsoft which, over the past six years, has explored a range of different aspects of computing systems, software technologies, and performance profiling. Although Hadoop has become the de facto platform for Big Data deployments over the last five years, little is still understood about how the different layers of software and hardware deployment options affect its performance. Early ALOJA results show that Hadoop's runtime performance, and therefore its price, are critically affected by relatively simple software and hardware configuration choices, e.g., the number of mappers, compression, or volume configuration. Project ALOJA presents a vendor-neutral repository featuring over 5,000 Hadoop runs, a test bed, and tools to evaluate the cost-effectiveness of different hardware, parameter tuning, and Cloud services for Hadoop. As few organizations have the time or performance-profiling expertise, we expect our growing repository to help Hadoop customers meet their Big Data application needs. ALOJA seeks to provide both knowledge and an online service with which users can make better-informed configuration choices for their Hadoop compute infrastructure, whether on-premises or cloud-based. The initial version of ALOJA's Web application and sources are available at http://hadoop.bsc.es. This work is partially supported by the Ministry of Science and Technology of Spain under contracts TIN2012-34557 and 2014SGR1051. Peer Reviewed.
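
    The sketch below shows the basic cost-effectiveness comparison that ALOJA automates: combining a run's execution time with the hourly price of the deployment to rank configurations. The configurations, times, and prices are made-up illustrative values, not data from the ALOJA repository.

    ```python
    # Minimal sketch of ranking Hadoop configurations by cost per run:
    # cost = execution time (hours) * deployment price ($/hour).
    # All values below are illustrative, not ALOJA repository data.
    runs = [
        # (configuration label, execution time in seconds, cluster price in $/hour)
        ("on-premise, HDD, 4 mappers",      3600, 4.20),
        ("on-premise, SSD, 8 mappers",      2100, 5.10),
        ("cloud, remote volumes, 8 mappers", 2800, 3.80),
    ]

    def cost(exec_time_s: float, price_per_hour: float) -> float:
        return exec_time_s / 3600 * price_per_hour

    for label, t, price in sorted(runs, key=lambda r: cost(r[1], r[2])):
        print(f"{label:36s} {t:5d} s  ${cost(t, price):.2f} per run")
    ```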